skip to main content


Search for: All records

Creators/Authors contains: "Dey, Tamal"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    Current complex prediction models are the result of fitting deep neural networks, graph convolutional networks or transducers to a set of training data. A key challenge with these models is that they are highly parameterized, which makes describing and interpreting the prediction strategies difficult. We use topological data analysis to transform these complex prediction models into a simplified topological view of the prediction landscape. The result is a map of the predictions that enables inspection of the model results with more specificity than dimensionality-reduction methods such as tSNE and UMAP. The methods scale up to large datasets across different domains. We present a case study of a transformer-based model previously designed to predict expression levels of a piece of DNA in thousands of genomic tracks. When the model is used to study mutations in theBRCA1gene, our topological analysis shows that it is sensitive to the location of a mutation and the exon structure ofBRCA1in ways that cannot be found with tools based on dimensionality reduction. Moreover, the topological framework offers multiple ways to inspect results, including an error estimate that is more accurate than model uncertainty. Further studies show how these ideas produce useful results in graph-based learning and image classification.

     
    more » « less
    Free, publicly-accessible full text available November 17, 2024
  2. Abstract

    The notion of generalized rank in the context of multiparameter persistence has become an important ingredient for defining interesting homological structures such as generalized persistence diagrams. However, its efficient computation has not yet been studied in the literature. We show that the generalized rank over a finite intervalIof a$$\textbf{Z}^2$$Z2-indexed persistence moduleMis equal to the generalized rank of the zigzag module that is induced on a certain path inItracing mostly its boundary. Hence, we can compute the generalized rank ofMoverIby computing the barcode of the zigzag module obtained by restricting to that path. IfMis the homology of a bifiltrationFof$$t$$tsimplices (while accounting for multi-criticality) andIconsists of$$t$$tpoints, this computation takes$$O(t^\omega )$$O(tω)time where$$\omega \in [2,2.373)$$ω[2,2.373)is the exponent of matrix multiplication. We apply this result to obtain an improved algorithm for the following problem. Given a bifiltration inducing a moduleM, determine whetherMis interval decomposable and, if so, compute all intervals supporting its indecomposable summands.

     
    more » « less
  3. Chambers, Erin W. ; Gudmundsson, Joachim (Ed.)
    We first introduce the notion of meta-rank for a 2-parameter persistence module, an invariant that captures the information behind images of morphisms between 1D slices of the module. We then define the meta-diagram of a 2-parameter persistence module to be the Möbius inversion of the meta-rank, resulting in a function that takes values from signed 1-parameter persistence modules. We show that the meta-rank and meta-diagram contain information equivalent to the rank invariant and the signed barcode. This equivalence leads to computational benefits, as we introduce an algorithm for computing the meta-rank and meta-diagram of a 2-parameter module M indexed by a bifiltration of n simplices in O(n³) time. This implies an improvement upon the existing algorithm for computing the signed barcode, which has O(n⁴) time complexity. This also allows us to improve the existing upper bound on the number of rectangles in the rank decomposition of M from O(n⁴) to O(n³). In addition, we define notions of erosion distance between meta-ranks and between meta-diagrams, and show that under these distances, meta-ranks and meta-diagrams are stable with respect to the interleaving distance. Lastly, the meta-diagram can be visualized in an intuitive fashion as a persistence diagram of diagrams, which generalizes the well-understood persistence diagram in the 1-parameter setting. 
    more » « less
    Free, publicly-accessible full text available June 9, 2024
  4. Abstract Background

    Interpretation of high-throughput gene expression data continues to require mathematical tools in data analysis that recognizes the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high dimensional constructs. In this work, we utilize some recent developments in TDA to curate gene expression data. Our work differs from the predecessors in two aspects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors which are used for classification. In contrast, this work involves curating relevant features to obtain somewhat better representatives with the help of TDA. This representatives of the entire data facilitates better comprehension of the phenotype labels. (2) Most of the earlier works employ barcodes obtained using topological summaries as fingerprints for the data. Even though they are stable signatures, there exists no direct mapping between the data and said barcodes.

    Results

    The topology relevant curated data that we obtain provides an improvement in shallow learning as well as deep learning based supervised classifications. We further show that the representative cycles we compute have an unsupervised inclination towards phenotype labels. This work thus shows that topological signatures are able to comprehend gene expression levels and classify cohorts accordingly.

    Conclusions

    In this work, we engender representative persistent cycles to discern the gene expression data. These cycles allow us to directly procure genes entailed in similar processes.

     
    more » « less
  5. Tomography is a widely used tool for analyzing microstructures in three dimensions (3D). The analysis, however, faces difficulty because the constituent materials produce similar grey-scale values. Sometimes, this prompts the image segmentation process to assign a pixel/voxel to the wrong phase (active material or pore). Consequently, errors are introduced in the microstructure characteristics calculation. In this work, we develop a filtering algorithm called PerSplat based on topological persistence (a technique used in topological data analysis) to improve segmentation quality. One problem faced when evaluating filtering algorithms is that real image data in general are not equipped with the `ground truth' for the microstructure characteristics. For this study, we construct synthetic images for which the ground-truth values are known. On the synthetic images, we compare the pore tortuosity and Minkowski functionals (volume and surface area) computed with our PerSplat filter and other methods such as total variation (TV) and non-local means (NL-means). Moreover, on a real 3D image, we visually compare the segmentation results provided by our filter against TV and NL-means. The experimental results indicate that PerSplat provides a significant improvement in segmentation quality. 
    more » « less
  6. null (Ed.)
    Graphs model real-world circumstances in many applications where they may constantly change to capture the dynamic behavior of the phenomena. Topological persistence which provides a set of birth and death pairs for the topological features is one instrument for analyzing such changing graph data. However, standard persistent homology defined over a growing space cannot always capture such a dynamic process unless shrinking with deletions is also allowed. Hence, zigzag persistence which incorporates both insertions and deletions of simplices is more appropriate in such a setting. Unlike standard persistence which admits nearly linear-time algorithms for graphs, such results for the zigzag version improving the general O(m^ω) time complexity are not known, where ω < 2.37286 is the matrix multiplication exponent. In this paper, we propose algorithms for zigzag persistence on graphs which run in near-linear time. Specifically, given a filtration with m additions and deletions on a graph with n vertices and edges, the algorithm for 0-dimension runs in O(mlog² n+mlog m) time and the algorithm for 1-dimension runs in O(mlog⁴ n) time. The algorithm for 0-dimension draws upon another algorithm designed originally for pairing critical points of Morse functions on 2-manifolds. The algorithm for 1-dimension pairs a negative edge with the earliest positive edge so that a 1-cycle containing both edges resides in all intermediate graphs. Both algorithms achieve the claimed time complexity via dynamic graph data structures proposed by Holm et al. In the end, using Alexander duality, we extend the algorithm for 0-dimension to compute the (p-1)-dimensional zigzag persistence for ℝ^p-embedded complexes in O(mlog² n+mlog m+nlog n) time. 
    more » « less
  7. null (Ed.)
    A combinatorial framework for dynamical systems provides an avenue for connecting classical dynamics with data-oriented, algorithmic methods. Combinatorial vector fields introduced by Forman [R. Forman, 1998; R. Forman, 1998] and their recent generalization to multivector fields [Mrozek, 2017] have provided a starting point for building such a connection. In this work, we strengthen this relationship by placing the Conley index in the persistent homology setting. Conley indices are homological features associated with so-called isolated invariant sets, so a change in the Conley index is a response to perturbation in an underlying multivector field. We show how one can use zigzag persistence to summarize changes to the Conley index, and we develop techniques to capture such changes in the presence of noise. We conclude by developing an algorithm to "track" features in a changing multivector field. 
    more » « less
  8. Persistent cycles, especially the minimal ones, are useful geometric features functioning as augmentations for the intervals in the purely topological persistence diagrams (also termed as barcodes). In our earlier work, we showed that computing minimal 1-dimensional persistent cycles (persistent 1-cycles) for finite intervals is NP-hard while the same for infinite intervals is polynomially tractable. In this paper, we address this problem for general dimensions with Z2 coefficients. In addition to proving that it is NP-hard to compute minimal persistent d-cycles (d>1) for both types of intervals given arbitrary simplicial complexes, we identify two interesting cases which are polynomially tractable. These two cases assume the complex to be a certain generalization of manifolds which we term as weak pseudomanifolds. For finite intervals from the d-th persistence diagram of a weak (d+1)-pseudomanifold, we utilize the fact that persistent cycles of such intervals are null-homologous and reduce the problem to a minimal cut problem. Since the same problem for infinite intervals is NP-hard, we further assume the weak (d+1)-pseudomanifold to be embedded in R^{d+1}Rd+1 so that the complex has a natural dual graph structure and the problem reduces to a minimal cut problem. Experiments with both algorithms on scientific data indicate that the minimal persistent cycles capture various significant features of the data. 
    more » « less
  9. Computation of the interleaving distance between persistence modules is a central task in topological data analysis. For $1$-D persistence modules, thanks to the isometry theorem, this can be done by computing the bottleneck distance with known efficient algorithms. The question is open for most $n$-D persistence modules, $n>1$, because of the well recognized complications of the indecomposables. Here, we consider a reasonably complicated class called {\em $2$-D interval decomposable} modules whose indecomposables may have a description of non-constant complexity. We present a polynomial time algorithm to compute the bottleneck distance for these modules from indecomposables, which bounds the interleaving distance from above, and give another algorithm to compute a new distance called {\em dimension distance} that bounds it from below. 
    more » « less